Skip to content

feat(tasks): richer task generation with recursive decomposition (#420)#444

Merged
frankbria merged 5 commits into
mainfrom
feature/issue-420-richer-task-generation
Mar 15, 2026
Merged

feat(tasks): richer task generation with recursive decomposition (#420)#444
frankbria merged 5 commits into
mainfrom
feature/issue-420-richer-task-generation

Conversation

@frankbria

@frankbria frankbria commented Mar 15, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #420

  • Rich metadata in task generation: Enhanced LLM prompt produces tasks with complexity_score, estimated_hours, uncertainty_level, depends_on, and files_to_modify — fields that existed in the model but were never populated
  • Recursive decomposition engine: classify_task()decompose_task()generate_task_tree() breaks PRDs into hierarchical atomic/composite task trees (inspired by tinyagi/fractals)
  • Tree-structure fields: Added parent_id, lineage, is_leaf, hierarchical_id to Task model and DB schema
  • Status propagation: Children done → parent auto-completes; child failed → parent fails
  • CLI commands: cf tasks generate --recursive [--max-depth N] and cf tasks tree (ASCII hierarchy display)
  • Lineage context: Task ancestry included in agent prompts via ContextPackager

Files Changed

Area Files What
Model/Schema core/tasks.py, core/workspace.py 4 new Task fields + DB columns
Rich Generation core/tasks.py Enhanced LLM prompt, dependency resolution
Tree Engine core/task_tree.py (NEW) classify, decompose, generate_tree, flatten, display, propagate
Context core/context_packager.py Lineage injection in agent prompts
CLI cli/app.py --recursive, --max-depth flags, tree command
Tests 5 test files 51 new tests

Test plan

  • 51 new tests across 5 test files
  • All 2357 v2 tests pass (0 regressions)
  • Ruff lint clean
  • Demo cf tasks generate with rich metadata
  • Demo cf tasks tree with hierarchical display
  • Demo backward compatibility (generate without --recursive)

Summary by CodeRabbit

  • New Features

    • Recursive task decomposition with configurable max depth and CLI flags
    • New CLI command to display task hierarchy as an ASCII tree
    • Tasks gain parent/child relationships, hierarchical IDs, lineage, and richer metadata (complexity, estimated hours, uncertainty)
    • Lineage context is included in prompts to improve decomposition
  • Tests

    • Extensive tests for recursive decomposition, tree display, schema fields, status propagation, and rich metadata generation

Test User added 4 commits March 14, 2026 20:14
Add parent_id, lineage, is_leaf, and hierarchical_id fields to support
hierarchical task decomposition. Includes DB schema, migrations, and
CRUD operations with full backward compatibility.
Update the LLM prompt to request complexity, estimated_hours, uncertainty,
depends_on_titles, and files_to_modify. Parse and validate these fields
with clamping/defaults. Resolve title-based dependencies to task IDs
post-creation.
…ns (#420)

Implement Steps 3 and 4 of issue #420: task_tree module with LLM-powered
classify/decompose/generate_task_tree for recursive decomposition, plus
flatten_task_tree, display_task_tree, and propagate_status for tree
management. 25 tests covering all functions.
…ommand (#420 steps 5-6)

- Add --recursive/-r and --max-depth options to `cf tasks generate` for
  recursive task decomposition via generate_task_tree/flatten_task_tree
- Add `cf tasks tree` command to display task hierarchy as ASCII tree
- Add lineage context to TaskContextPackager.build() prompt when task
  has lineage data, appearing before verification gates
- 11 new tests: 7 CLI tests + 4 context packager lineage tests
@coderabbitai

coderabbitai Bot commented Mar 15, 2026

Copy link
Copy Markdown
Contributor

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ffff2449-cc0b-4559-8d55-7027fe76e702

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd347a and 1250ded.

📒 Files selected for processing (1)
  • tests/core/test_rich_task_generation.py

Walkthrough

Adds recursive task decomposition and tree management via a new task_tree module, extends Task schema with tree fields, integrates lineage into prompts, adds CLI flags for recursive generation and a tasks_tree command, and includes schema migrations plus comprehensive tests for tree generation, flattening, display, and propagation.

Changes

Cohort / File(s) Summary
CLI Commands
codeframe/cli/app.py
Added --recursive (-r) and --max-depth options to tasks_generate; branching to invoke generate_task_tree + flatten_task_tree when recursive, preserve existing LLM/no-LLM paths otherwise. Added new tasks_tree command that calls display_task_tree and handles errors.
Task Tree Module
codeframe/core/task_tree.py
New module implementing classify_task, decompose_task, generate_task_tree, _parse_subtasks, flatten_task_tree, display_task_tree, _render_node, and propagate_status. Handles lineage context, JSON/markdown parsing, 2–7 subtask clamping/padding, hierarchical IDs, and ASCII tree rendering.
Task Model & Storage
codeframe/core/tasks.py, codeframe/core/workspace.py
Added Task fields: parent_id, lineage, is_leaf, hierarchical_id. Updated create, get, list_tasks, and row conversion to persist/return these fields (lineage JSON, is_leaf int). Workspace schema migration adds corresponding columns with idempotent checks. Also enriches generated tasks with complexity, estimated_hours, uncertainty and resolves depends_on_titles → task IDs.
Context Integration
codeframe/core/context_packager.py
Appends a "Task Lineage" section to packed prompts when a task has lineage, joining ancestor entries with → and placing the section before verification gates.
Tests — Task Tree & Schema
tests/core/test_task_tree.py, tests/core/test_task_tree_schema.py
New comprehensive tests covering classification, decomposition (JSON and markdown-wrapped), recursive generation, flattening to workspace (hierarchical IDs, parent links), ASCII display, status propagation, and schema persistence/backward-compatibility for new fields.
Tests — CLI, Rich Generation, Context
tests/cli/test_tasks_tree_cli.py, tests/core/test_rich_task_generation.py, tests/core/test_context_packager.py
CLI tests for --recursive and --max-depth, and paths for recursive vs non-recursive generation. Rich generation tests for complexity/estimated_hours/uncertainty/files/dependencies handling and edge cases. Context packager tests assert lineage section presence and ordering.

Sequence Diagrams

sequenceDiagram
    participant CLI as "tasks_generate CLI"
    participant TreeGen as "generate_task_tree"
    participant Classify as "classify_task"
    participant Provider as "LLM Provider"
    participant Decompose as "decompose_task"
    participant Flatten as "flatten_task_tree"
    participant Workspace as "Workspace DB"

    CLI->>TreeGen: Invoke with PRD description, lineage, max_depth
    TreeGen->>Classify: Ask: atomic or composite? (includes lineage)
    Classify->>Provider: LLM prompt (with lineage)
    Provider-->>Classify: "atomic" / "composite"
    alt atomic or depth limit reached
        Classify-->>TreeGen: Return leaf node
    else composite
        TreeGen->>Decompose: Request 2–7 subtasks (with lineage)
        Decompose->>Provider: LLM prompt (with lineage)
        Provider-->>Decompose: Subtasks (JSON / markdown)
        Decompose-->>TreeGen: Parsed subtasks
        loop per subtask
            TreeGen->>TreeGen: Recurse (lineage + parent, depth+1)
        end
    end
    TreeGen->>Flatten: Return nested tree to flatten
    Flatten->>Workspace: Create task rows (parent_id, lineage, hierarchical_id)
    Workspace-->>Flatten: Persisted task IDs
    Flatten-->>CLI: Completed
Loading
sequenceDiagram
    participant CLI as "tasks_tree CLI"
    participant Workspace as "Workspace DB"
    participant Display as "display_task_tree"
    participant Render as "_render_node"
    participant Console as "Console Output"

    CLI->>Display: Request tree for workspace
    Display->>Workspace: Load all tasks
    Workspace-->>Display: Return task list
    Display->>Display: Build parent→children map, sort roots
    loop per root
        Display->>Render: Render node line (status, kind, hierarchical id)
        Render-->>Console: Emit line
        loop per child
            Render->>Render: Recurse with indent
            Render-->>Console: Emit child line
        end
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰
I nibble ideas into branch and root,
Tasks unfold like carrots, crisp and cute,
Lineage guides each hop and tiny part,
Leaves turn done when children play their part,
A recursive hop — the rabbit cheers, heart-smart!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main changes: richer task generation (metadata fields) and recursive decomposition as a new feature.
Linked Issues check ✅ Passed The PR implementation fully addresses all coding requirements from issue #420: recursive decomposition engine, task tree structure, rich metadata, CLI commands, status propagation, lineage tracking, and comprehensive tests.
Out of Scope Changes check ✅ Passed All changes align with the stated objectives; no out-of-scope modifications detected beyond the recursive decomposition, task tree, and associated infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 98.73% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/issue-420-richer-task-generation
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
codeframe/cli/app.py (1)

1597-1616: ⚠️ Potential issue | 🟠 Major

Handle --recursive and --no-llm as mutually exclusive.

Line 1597 skips LLM validation when --no-llm is set, but Line 1601 still executes the LLM-only recursive path. cf tasks generate --recursive --no-llm currently has contradictory behavior and can fail with a confusing provider error path.

💡 Suggested fix
-        if not no_llm:
+        if recursive and no_llm:
+            console.print("[red]Error:[/red] --recursive cannot be combined with --no-llm")
+            raise typer.Exit(1)
+
+        if not no_llm:
             from codeframe.cli.validators import require_anthropic_api_key
             require_anthropic_api_key()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 1597 - 1616, The recursive branch
currently runs LLM-only logic even when --no-llm is passed; make the flags
mutually exclusive by validating them before using LLM functions: add a check
that if recursive and no_llm are both true, print an error (or raise/exit) and
abort, or change the recursive condition to "if recursive and not no_llm" so the
code only calls get_provider(), generate_task_tree, and flatten_task_tree when
an LLM is allowed; also ensure require_anthropic_api_key() is only called when
an LLM will actually be used (i.e., when not no_llm).
🧹 Nitpick comments (1)
tests/cli/test_tasks_tree_cli.py (1)

71-77: Assert result.exit_code in the happy-path CLI tests.

Right now both tests can pass even if the command crashes after calling the mocked function. An exit-code assertion will make the behavior check meaningful and prevent false positives.

Proposed fix
             result = runner.invoke(
                 app,
                 ["tasks", "generate", "--recursive", "-w", str(tmp_path)],
             )

+            assert result.exit_code == 0, result.output
             mock_gen_tree.assert_called_once()
             mock_flatten.assert_called_once()
@@
             result = runner.invoke(
                 app,
                 ["tasks", "generate", "-w", str(tmp_path)],
             )

+            assert result.exit_code == 0, result.output
             mock_gen.assert_called_once()

Also applies to: 104-109

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/cli/test_tasks_tree_cli.py` around lines 71 - 77, Add an explicit
exit-code assertion after the CLI invocation to ensure the command completed
successfully: after the runner.invoke(app, ["tasks", "generate", "--recursive",
"-w", str(tmp_path)]) call, assert result.exit_code == 0 (do the same for the
other similar test later in the file that also calls runner.invoke and asserts
mock_gen_tree/mock_flatten). This ensures the test fails if the CLI crashes even
when the mocks were called.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/core/task_tree.py`:
- Around line 26-29: The recursive decomposition currently only requests and
persists 'title'/'description' (DECOMPOSE_SYSTEM_PROMPT and the code paths that
parse/save decomposed children), losing richer task metadata; update the
decomposition prompt (DECOMPOSE_SYSTEM_PROMPT) to require the full task schema
(complexity, estimate, dependencies, files, acceptance_criteria, etc.) and
modify the recursive parse/save logic (the functions that parse the
decomposition response and the code that creates/persists child nodes in
TaskTree) to accept and persist all those fields instead of only
title/description/tree fields so downstream features receive full metadata.
- Around line 396-400: The parent status update in propagate_status mutates
state via task_module.update_status(parent.id) but doesn't emit the required
audit event; after the successful change where new_status != parent.status, call
the core events API (import core.events) to emit a status-transition event
(e.g., core.events.emit_status_transition or the project's equivalent) with the
task id, old status (parent.status), new_status and any workspace/context so all
auto-propagated transitions are recorded; ensure the emit happens only when
update_status is invoked and include the same identifiers used in
propagate_status and task_module.update_status.
- Around line 305-308: The sort key currently treats hierarchical_id as a string
so "1.10" sorts before "1.2"; change the key used when sorting children_map
entries so it converts t.hierarchical_id into a tuple of integers (e.g.,
tuple(map(int, t.hierarchical_id.split('.')))) and fall back to t.title or a
safe tuple when hierarchical_id is None or contains non-numeric parts; update
the sort call on children_map (the lambda at the children_map[pid].sort(...)
line) to parse and return the numeric tuple for proper numeric hierarchical
ordering.

In `@codeframe/core/tasks.py`:
- Around line 559-561: The task creation currently only persists
complexity_score, estimated_hours, and uncertainty_level from task_data and
flattens/loses other structured metadata; update the Task construction (where
complexity_score, estimated_hours, uncertainty_level are set) to also assign
files_to_modify=task_data.get("files_to_modify"),
files_to_create=task_data.get("files_to_create"), and
acceptance_criteria=task_data.get("acceptance_criteria") instead of folding them
into description, and apply the same fix to the other parsing/creation sites
referenced (the blocks around lines 592-599 and 643-655) so the structured
metadata survives parsing and downstream callers can consume those fields.
- Around line 631-637: The conversions for task["complexity"] and
task["estimated_hours"] are unsafe for untrusted LLM output; wrap the int() and
float() conversions in try/except (catch ValueError and TypeError), clamp valid
parsed values as before (complexity -> max(1,min(5,int(...))), estimated_hours
-> max(0.1,float(...))) and on parse failure log a warning and set the field to
None or a safe default so generate_from_prd() does not treat a single malformed
value as a hard failure; update the code around the task dict handling where
complexity and estimated_hours are parsed to implement this defensive parsing
and logging.
- Around line 565-571: The current resolution loop silently drops dependency
titles that don't exactly match and never refreshes created_tasks with the
updated objects returned by update_depends_on; change the logic in the block
that builds title_to_id and iterates over tasks_data/created_tasks so that it
(1) detects and surfaces missing or duplicate titles from
tasks_data.get("depends_on_titles") (raise an exception or return a clear
error/log instead of silently skipping) and (2) assigns the result of
update_depends_on(workspace, task.id, dep_ids) back into created_tasks (i.e.,
replace the task in created_tasks with the returned Task) so callers receive
up-to-date depends_on values; retain references to title_to_id, created_tasks,
update_depends_on, tasks_data, and depends_on_titles when making these changes.

In `@codeframe/core/workspace.py`:
- Around line 424-435: The upgrade path in _ensure_schema_upgrades() must add
the github_issue_number column so tasks.py's unconditional SELECT won't fail;
add a check like the existing ones for "github_issue_number" on the tasks table
and run an ALTER TABLE tasks ADD COLUMN github_issue_number TEXT (or INTEGER)
DEFAULT NULL followed by conn.commit(); update the block that checks
task_columns (in codeframe/core/workspace.py, the function
_ensure_schema_upgrades) to include this ALTER and ensure the schema is present
before any reads from codeframe/core/tasks.py.

In `@tests/core/test_context_packager.py`:
- Around line 465-542: The module tests/core/test_context_packager.py contains
new v2 lineage tests (e.g. TestLineageContext and its methods like
test_context_packager_includes_lineage) but lacks the module-level v2 marker;
add an import for pytest and set pytestmark = pytest.mark.v2 at the top of the
file so all tests in this module are marked v2 per the repo testing rules.

In `@tests/core/test_rich_task_generation.py`:
- Around line 10-13: Remove the unused import PrdRecord from the test file to
satisfy the linter: edit the imports at the top of
tests/core/test_rich_task_generation.py and delete the reference to PrdRecord so
only used symbols (MockProvider, tasks, create_or_load_workspace) remain; ensure
no other references to PrdRecord exist in the file before committing.

---

Outside diff comments:
In `@codeframe/cli/app.py`:
- Around line 1597-1616: The recursive branch currently runs LLM-only logic even
when --no-llm is passed; make the flags mutually exclusive by validating them
before using LLM functions: add a check that if recursive and no_llm are both
true, print an error (or raise/exit) and abort, or change the recursive
condition to "if recursive and not no_llm" so the code only calls
get_provider(), generate_task_tree, and flatten_task_tree when an LLM is
allowed; also ensure require_anthropic_api_key() is only called when an LLM will
actually be used (i.e., when not no_llm).

---

Nitpick comments:
In `@tests/cli/test_tasks_tree_cli.py`:
- Around line 71-77: Add an explicit exit-code assertion after the CLI
invocation to ensure the command completed successfully: after the
runner.invoke(app, ["tasks", "generate", "--recursive", "-w", str(tmp_path)])
call, assert result.exit_code == 0 (do the same for the other similar test later
in the file that also calls runner.invoke and asserts
mock_gen_tree/mock_flatten). This ensures the test fails if the CLI crashes even
when the mocks were called.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0c296c73-8567-4445-9b9a-c4367ed4656a

📥 Commits

Reviewing files that changed from the base of the PR and between cf3fa89 and 6dd347a.

📒 Files selected for processing (10)
  • codeframe/cli/app.py
  • codeframe/core/context_packager.py
  • codeframe/core/task_tree.py
  • codeframe/core/tasks.py
  • codeframe/core/workspace.py
  • tests/cli/test_tasks_tree_cli.py
  • tests/core/test_context_packager.py
  • tests/core/test_rich_task_generation.py
  • tests/core/test_task_tree.py
  • tests/core/test_task_tree_schema.py

Comment on lines +26 to +29
DECOMPOSE_SYSTEM_PROMPT = (
"Break this task into 2-7 concrete subtasks. Each should be actionable and "
"testable. Return a JSON array of objects with 'title' and 'description' fields."
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Recursive flow currently drops rich task metadata.

The recursive decomposition contract only captures title/description, and Line 254-263 persists only those plus tree fields. That means recursive-generated tasks miss richer generation data (e.g., complexity/estimate/dependencies/files/acceptance criteria), so downstream features won’t get populated in this path.

Also applies to: 121-160, 254-263

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/task_tree.py` around lines 26 - 29, The recursive
decomposition currently only requests and persists 'title'/'description'
(DECOMPOSE_SYSTEM_PROMPT and the code paths that parse/save decomposed
children), losing richer task metadata; update the decomposition prompt
(DECOMPOSE_SYSTEM_PROMPT) to require the full task schema (complexity, estimate,
dependencies, files, acceptance_criteria, etc.) and modify the recursive
parse/save logic (the functions that parse the decomposition response and the
code that creates/persists child nodes in TaskTree) to accept and persist all
those fields instead of only title/description/tree fields so downstream
features receive full metadata.

Comment on lines +305 to +308
for pid in children_map:
children_map[pid].sort(
key=lambda t: t.hierarchical_id or t.title
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sort key for hierarchical_id is lexicographic, not numeric.

Line 305-308 will order 1.10 before 1.2. This breaks tree display ordering once siblings exceed 9.

💡 Suggested fix
+def _hierarchical_sort_key(task) -> tuple:
+    hid = task.hierarchical_id
+    if hid:
+        parts = []
+        for p in str(hid).split("."):
+            parts.append(int(p) if p.isdigit() else p)
+        return (0, parts)
+    return (1, task.title.lower())
+
 # Sort children by hierarchical_id if available, else by title
 for pid in children_map:
-    children_map[pid].sort(
-        key=lambda t: t.hierarchical_id or t.title
-    )
+    children_map[pid].sort(key=_hierarchical_sort_key)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for pid in children_map:
children_map[pid].sort(
key=lambda t: t.hierarchical_id or t.title
)
def _hierarchical_sort_key(task) -> tuple:
hid = task.hierarchical_id
if hid:
parts = []
for p in str(hid).split("."):
parts.append(int(p) if p.isdigit() else p)
return (0, parts)
return (1, task.title.lower())
# Sort children by hierarchical_id if available, else by title
for pid in children_map:
children_map[pid].sort(key=_hierarchical_sort_key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/task_tree.py` around lines 305 - 308, The sort key currently
treats hierarchical_id as a string so "1.10" sorts before "1.2"; change the key
used when sorting children_map entries so it converts t.hierarchical_id into a
tuple of integers (e.g., tuple(map(int, t.hierarchical_id.split('.')))) and fall
back to t.title or a safe tuple when hierarchical_id is None or contains
non-numeric parts; update the sort call on children_map (the lambda at the
children_map[pid].sort(...) line) to parse and return the numeric tuple for
proper numeric hierarchical ordering.

Comment on lines +396 to +400
if new_status and new_status != parent.status:
task_module.update_status(workspace, parent.id, new_status)

# Recursively propagate upward
propagate_status(workspace, parent.id)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Emit events when propagated parent status changes.

Line 397 mutates parent task state but does not emit a status-transition event, so audit/observability is incomplete for auto-propagated transitions.

💡 Suggested fix
+from codeframe.core.events import emit_for_workspace, EventType
@@
     if new_status and new_status != parent.status:
+        old_status = parent.status
         task_module.update_status(workspace, parent.id, new_status)
+        emit_for_workspace(
+            workspace,
+            EventType.TASK_STATUS_CHANGED,
+            {
+                "task_id": parent.id,
+                "old_status": old_status.value,
+                "new_status": new_status.value,
+            },
+            print_event=False,
+        )

As per coding guidelines codeframe/core/**/*.py: “All core modules must emit events for state transitions via core/events.py for audit and observability”.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if new_status and new_status != parent.status:
task_module.update_status(workspace, parent.id, new_status)
# Recursively propagate upward
propagate_status(workspace, parent.id)
from codeframe.core.events import emit_for_workspace, EventType
if new_status and new_status != parent.status:
old_status = parent.status
task_module.update_status(workspace, parent.id, new_status)
emit_for_workspace(
workspace,
EventType.TASK_STATUS_CHANGED,
{
"task_id": parent.id,
"old_status": old_status.value,
"new_status": new_status.value,
},
print_event=False,
)
# Recursively propagate upward
propagate_status(workspace, parent.id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/task_tree.py` around lines 396 - 400, The parent status update
in propagate_status mutates state via task_module.update_status(parent.id) but
doesn't emit the required audit event; after the successful change where
new_status != parent.status, call the core events API (import core.events) to
emit a status-transition event (e.g., core.events.emit_status_transition or the
project's equivalent) with the task id, old status (parent.status), new_status
and any workspace/context so all auto-propagated transitions are recorded;
ensure the emit happens only when update_status is invoked and include the same
identifiers used in propagate_status and task_module.update_status.

Comment thread codeframe/core/tasks.py
Comment on lines +559 to +561
complexity_score=task_data.get("complexity"),
estimated_hours=task_data.get("estimated_hours"),
uncertainty_level=task_data.get("uncertainty"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The richer metadata contract is still lossy.

This flow only persists complexity/hours/uncertainty. files_to_modify gets flattened into description, and files_to_create / acceptance_criteria never survive parsing at all, so downstream callers still cannot consume the structured metadata this PR is supposed to add.

Also applies to: 592-599, 643-655

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/tasks.py` around lines 559 - 561, The task creation currently
only persists complexity_score, estimated_hours, and uncertainty_level from
task_data and flattens/loses other structured metadata; update the Task
construction (where complexity_score, estimated_hours, uncertainty_level are
set) to also assign files_to_modify=task_data.get("files_to_modify"),
files_to_create=task_data.get("files_to_create"), and
acceptance_criteria=task_data.get("acceptance_criteria") instead of folding them
into description, and apply the same fix to the other parsing/creation sites
referenced (the blocks around lines 592-599 and 643-655) so the structured
metadata survives parsing and downstream callers can consume those fields.

Comment thread codeframe/core/tasks.py
Comment on lines +565 to +571
# Resolve title-based dependencies to task IDs
title_to_id = {t.title: t.id for t in created_tasks}
for task_data, task in zip(tasks_data, created_tasks):
dep_titles = task_data.get("depends_on_titles", [])
dep_ids = [title_to_id[t] for t in dep_titles if t in title_to_id]
if dep_ids:
update_depends_on(workspace, task.id, dep_ids)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Dependency resolution should not silently weaken the graph or return stale tasks.

A typo, truncation mismatch, or duplicate title just disappears here because only exact matches survive. Also, update_depends_on() returns the updated Task, but created_tasks is never refreshed, so callers get stale depends_on values back even when resolution succeeds.

Proposed fix
-    title_to_id = {t.title: t.id for t in created_tasks}
-    for task_data, task in zip(tasks_data, created_tasks):
+    title_to_id = {t.title: t.id for t in created_tasks}
+    if len(title_to_id) != len(created_tasks):
+        raise ValueError("Duplicate task titles prevent dependency resolution")
+
+    for i, (task_data, task) in enumerate(zip(tasks_data, created_tasks)):
         dep_titles = task_data.get("depends_on_titles", [])
-        dep_ids = [title_to_id[t] for t in dep_titles if t in title_to_id]
+        unresolved = [title for title in dep_titles if title not in title_to_id]
+        if unresolved:
+            raise ValueError(
+                f"Unresolved dependencies for {task.title}: {', '.join(unresolved)}"
+            )
+
+        dep_ids = [title_to_id[title] for title in dep_titles]
         if dep_ids:
-            update_depends_on(workspace, task.id, dep_ids)
+            created_tasks[i] = update_depends_on(workspace, task.id, dep_ids)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/tasks.py` around lines 565 - 571, The current resolution loop
silently drops dependency titles that don't exactly match and never refreshes
created_tasks with the updated objects returned by update_depends_on; change the
logic in the block that builds title_to_id and iterates over
tasks_data/created_tasks so that it (1) detects and surfaces missing or
duplicate titles from tasks_data.get("depends_on_titles") (raise an exception or
return a clear error/log instead of silently skipping) and (2) assigns the
result of update_depends_on(workspace, task.id, dep_ids) back into created_tasks
(i.e., replace the task in created_tasks with the returned Task) so callers
receive up-to-date depends_on values; retain references to title_to_id,
created_tasks, update_depends_on, tasks_data, and depends_on_titles when making
these changes.

Comment thread codeframe/core/tasks.py
Comment on lines +631 to +637
complexity = task.get("complexity")
if complexity is not None:
complexity = max(1, min(5, int(complexity)))

estimated_hours = task.get("estimated_hours")
if estimated_hours is not None:
estimated_hours = max(0.1, float(estimated_hours))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle malformed numeric metadata defensively.

These values come from untrusted LLM output. int() / float() can raise here, and generate_from_prd() currently treats ValueError as a hard config failure, so one bad "complexity" or "estimated_hours" field can abort the entire generation path instead of degrading gracefully.

Proposed fix
-        complexity = task.get("complexity")
-        if complexity is not None:
-            complexity = max(1, min(5, int(complexity)))
+        complexity = task.get("complexity")
+        try:
+            complexity = (
+                max(1, min(5, int(complexity)))
+                if complexity is not None
+                else None
+            )
+        except (TypeError, ValueError):
+            complexity = None

-        estimated_hours = task.get("estimated_hours")
-        if estimated_hours is not None:
-            estimated_hours = max(0.1, float(estimated_hours))
+        estimated_hours = task.get("estimated_hours")
+        try:
+            estimated_hours = (
+                max(0.1, float(estimated_hours))
+                if estimated_hours is not None
+                else None
+            )
+        except (TypeError, ValueError):
+            estimated_hours = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/tasks.py` around lines 631 - 637, The conversions for
task["complexity"] and task["estimated_hours"] are unsafe for untrusted LLM
output; wrap the int() and float() conversions in try/except (catch ValueError
and TypeError), clamp valid parsed values as before (complexity ->
max(1,min(5,int(...))), estimated_hours -> max(0.1,float(...))) and on parse
failure log a warning and set the field to None or a safe default so
generate_from_prd() does not treat a single malformed value as a hard failure;
update the code around the task dict handling where complexity and
estimated_hours are parsed to implement this defensive parsing and logging.

Comment on lines +424 to +435
if "parent_id" not in task_columns:
cursor.execute("ALTER TABLE tasks ADD COLUMN parent_id TEXT")
conn.commit()
if "lineage" not in task_columns:
cursor.execute("ALTER TABLE tasks ADD COLUMN lineage TEXT DEFAULT '[]'")
conn.commit()
if "is_leaf" not in task_columns:
cursor.execute("ALTER TABLE tasks ADD COLUMN is_leaf INTEGER DEFAULT 1")
conn.commit()
if "hierarchical_id" not in task_columns:
cursor.execute("ALTER TABLE tasks ADD COLUMN hierarchical_id TEXT")
conn.commit()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Backfill github_issue_number in the upgrade path before shipping this schema change.

codeframe/core/tasks.py now selects github_issue_number unconditionally, but _ensure_schema_upgrades() still never adds that column for pre-existing workspaces. On an older DB, the first task read will fail with sqlite3.OperationalError: no such column: github_issue_number.

Proposed fix
         if "uncertainty_level" not in task_columns:
             cursor.execute("ALTER TABLE tasks ADD COLUMN uncertainty_level TEXT")
             conn.commit()
+        if "github_issue_number" not in task_columns:
+            cursor.execute("ALTER TABLE tasks ADD COLUMN github_issue_number INTEGER")
+            conn.commit()
         if "parent_id" not in task_columns:
             cursor.execute("ALTER TABLE tasks ADD COLUMN parent_id TEXT")
             conn.commit()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/workspace.py` around lines 424 - 435, The upgrade path in
_ensure_schema_upgrades() must add the github_issue_number column so tasks.py's
unconditional SELECT won't fail; add a check like the existing ones for
"github_issue_number" on the tasks table and run an ALTER TABLE tasks ADD COLUMN
github_issue_number TEXT (or INTEGER) DEFAULT NULL followed by conn.commit();
update the block that checks task_columns (in codeframe/core/workspace.py, the
function _ensure_schema_upgrades) to include this ALTER and ensure the schema is
present before any reads from codeframe/core/tasks.py.

Comment on lines +465 to +542
class TestLineageContext:
"""Tests for lineage inclusion in build() prompt."""

def test_context_packager_includes_lineage(self, mock_workspace):
"""Task with lineage should have 'Task Lineage' section in prompt."""
ctx = MagicMock(spec=TaskContext)
ctx.task = MagicMock()
ctx.task.lineage = ["Build app", "Authentication module"]
ctx.to_prompt_context.return_value = (
"## Task\n**Title:** Implement JWT\n**Description:** Add tokens\n"
)
ctx.relevant_files = []

with patch("codeframe.core.context_packager.ContextLoader") as MockLoader:
MockLoader.return_value.load.return_value = ctx

packager = TaskContextPackager(mock_workspace)
result = packager.build("task-1")

assert "Task Lineage" in result.prompt
assert "Build app" in result.prompt
assert "Authentication module" in result.prompt

def test_context_packager_no_lineage(self, mock_workspace):
"""Task without lineage should not have 'Task Lineage' section."""
ctx = MagicMock(spec=TaskContext)
ctx.task = MagicMock()
ctx.task.lineage = []
ctx.to_prompt_context.return_value = (
"## Task\n**Title:** Simple task\n**Description:** Do it\n"
)
ctx.relevant_files = []

with patch("codeframe.core.context_packager.ContextLoader") as MockLoader:
MockLoader.return_value.load.return_value = ctx

packager = TaskContextPackager(mock_workspace)
result = packager.build("task-1")

assert "Task Lineage" not in result.prompt

def test_context_packager_lineage_missing_attribute(self, mock_workspace):
"""Task without lineage attribute should not have 'Task Lineage' section."""
ctx = MagicMock(spec=TaskContext)
ctx.task = MagicMock(spec=["title", "description", "id"])
# No lineage attribute on task
ctx.to_prompt_context.return_value = (
"## Task\n**Title:** Old task\n**Description:** Legacy\n"
)
ctx.relevant_files = []

with patch("codeframe.core.context_packager.ContextLoader") as MockLoader:
MockLoader.return_value.load.return_value = ctx

packager = TaskContextPackager(mock_workspace)
result = packager.build("task-1")

assert "Task Lineage" not in result.prompt

def test_lineage_appears_before_gates(self, mock_workspace):
"""Lineage section should appear before Verification Gates."""
ctx = MagicMock(spec=TaskContext)
ctx.task = MagicMock()
ctx.task.lineage = ["Parent task"]
ctx.to_prompt_context.return_value = (
"## Task\n**Title:** Child task\n"
)
ctx.relevant_files = []

with patch("codeframe.core.context_packager.ContextLoader") as MockLoader:
MockLoader.return_value.load.return_value = ctx

packager = TaskContextPackager(mock_workspace)
result = packager.build("task-1")

lineage_pos = result.prompt.index("Task Lineage")
gates_pos = result.prompt.index("Verification Gates")
assert lineage_pos < gates_pos

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add the missing v2 marker for this module.

These new lineage tests cover v2 functionality, but the file still lacks a module-level pytestmark = pytest.mark.v2, so they won't follow the repo's test-selection rules.

Proposed fix
 import pytest
 from pathlib import Path
 from unittest.mock import MagicMock, patch

+pytestmark = pytest.mark.v2
+
 from codeframe.core.context_packager import TaskContextPackager, PackagedContext
 from codeframe.core.context import TaskContext, FileInfo
 from codeframe.core.adapters.agent_adapter import AgentContext
As per coding guidelines "`tests/**/*.py`: New v2 Python tests must be marked with `@pytest.mark.v2` decorator or `pytestmark = pytest.mark.v2`."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/core/test_context_packager.py` around lines 465 - 542, The module
tests/core/test_context_packager.py contains new v2 lineage tests (e.g.
TestLineageContext and its methods like test_context_packager_includes_lineage)
but lacks the module-level v2 marker; add an import for pytest and set
pytestmark = pytest.mark.v2 at the top of the file so all tests in this module
are marked v2 per the repo testing rules.

Comment thread tests/core/test_rich_task_generation.py
@frankbria frankbria merged commit 1a15a27 into main Mar 15, 2026
28 of 29 checks passed
@claude

claude Bot commented Mar 15, 2026

Copy link
Copy Markdown

PR Review: feat(tasks): richer task generation with recursive decomposition

This is a substantial and well-structured PR. The module separation is clean (task_tree.py is properly headless), test coverage is solid (51 tests), and the schema migration approach is consistent with the existing pattern. A few critical issues to address before merging.


Critical Issues

1. generate_task_tree receives the full PRD as description, not a task title

In cli/app.py, the recursive path calls generate_task_tree(provider, prd_record.content, lineage=[], depth=0, max_depth=max_depth). The function calls classify_task on this string, asking the LLM 'is this atomic or composite?' - but it is a full PRD, not a task description. The LLM will almost certainly classify it as composite and decompose_task will try to break the entire PRD into 2-7 subtasks, which is not the intended behavior. The correct flow: first extract top-level tasks from the PRD (as _generate_tasks_with_llm does today), then recursively decompose each task. Without this fix, the recursive path produces a fundamentally different structure than what the feature description implies.

2. propagate_status is defined but never called

The PR description says 'Children done, parent auto-completes' but propagate_status is not wired into runtime.py or state_machine.py. Status propagation will never fire during normal agent execution. Either wire it in (e.g., call it from complete_run() / fail_run() after update_status) or document clearly that it is a utility for future integration.

3. LLM call explosion with no user warning

With max_depth=3 and composite tasks generating up to 7 subtasks, worst-case is approximately 800 LLM calls for a single 'cf tasks generate --recursive'. The user gets no cost estimate or call-count warning. At max_depth=5 (the CLI limit), this becomes catastrophically expensive. At minimum, print an estimated call-count warning or require an explicit --confirm flag.


Moderate Issues

4. _row_to_task positional indexing is fragile

The new fields use positional row[14], row[15], row[16], row[17] with len(row) > N guards. This already existed for github_issue_number at row[13] and is a known maintenance risk - any query path reusing _row_to_task without the new columns selected will silently return wrong data. Consider using sqlite3.Row factory with named column access (row['parent_id']) to make this robust across future refactors.

5. Hierarchical ID sorting uses string comparison

children_map[pid].sort(key=lambda t: t.hierarchical_id or t.title) - String sort puts '1.10' before '1.2'. Fix: key=lambda t: [int(p) for p in t.hierarchical_id.split('.')] if t.hierarchical_id else [999]

6. files_to_modify is appended to description rather than stored

The LLM is asked to provide files_to_modify but the values are concatenated into the description string. This pollutes the description field and makes the data unusable programmatically. Either add a dedicated model field, or drop it from the prompt until it can be stored cleanly.


Minor Observations

  • test_display_empty_workspace has a dead-code 'or' branch: display_task_tree always returns 'No tasks found.' so the empty-string check is unreachable.
  • The test helper _update_tree_fields writes raw SQL inside a test file. Acceptable short-term, but consider moving to a tasks.py internal helper if more tests need it.
  • classify_task uses Purpose.PLANNING (Sonnet). Since it needs only a single-word answer, Purpose.GENERATION (Haiku) would be faster and cheaper.

What Is Working Well

  • Clean headless module design: task_tree.py has no FastAPI imports, consistent with architecture rules.
  • Backward-compatible schema migration follows the existing _ensure_schema_upgrades pattern correctly.
  • Rich metadata fields (complexity_score, estimated_hours, uncertainty_level, depends_on_titles) in the standard LLM prompt are immediately useful and non-breaking.
  • _parse_subtasks handles both raw JSON and markdown-wrapped JSON - good defensive parsing.
  • 51 tests with solid coverage of edge cases (clamping, fallbacks, multi-level propagation).

Two blockers before merge: (1) the PRD-as-description design bug in generate_task_tree, and (2) propagate_status not being wired in. The other items can be tracked as follow-up issues.

Repository owner deleted a comment from claude Bot Mar 15, 2026
@frankbria frankbria deleted the feature/issue-420-richer-task-generation branch March 24, 2026 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Phase 5] Richer Task Generation: Dependencies, Complexity, and File Scope

1 participant